Automatic recognition of printed Farsi texts
Abstract
The automatic recognition of printed Farsi (Persian) texts is complicated by several properties of the Farsi script: (a) connectivity of symbols, (b) similarity of groups of symbols, (c) highly variable widths, (d) subword overlap, and (e) line overlap. In this paper, a technique for the automatic recognition of printed Farsi texts is presented, and its steps are discussed as follows: (1) digitization, (2) editing, (3) line separation, (4) subword separation, (5) symbol separation, (6) recognition, and (7) postprocessing. The most notable contributions of this work lie in the algorithms for steps (5) and (6). Practical application of the technique to Farsi newspaper headlines has been 100% successful. However, smaller type fonts, which could not be handled by the coarse digitization hardware used, will no doubt result in less than perfect recognition. The technique is also applicable, with little or no modification, to printed Arabic and Urdu texts, which use the same alphabet as Farsi.
Keywords: Character recognition, Computer input, Document input, Optical character recognition, Pattern recognition, Persian, Farsi, Feature selection, Printed text recognition
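The abstract names the pipeline steps but does not describe the paper's algorithms. As a rough illustration only, the sketch below implements two generic, textbook techniques for steps (3) and (4). It splits a page into lines wherever the horizontal projection profile (ink pixels per row) is non-zero, and it splits a line into 8-connected ink components as candidate subwords. It assumes a binary image stored as a list of rows of 0/1 (1 = ink), and the names `Image`, `split_lines`, and `connected_components` are hypothetical. This is not the paper's method. A plain projection split fails on the line overlap (e) that the abstract lists, and each dot or diacritic comes out as its own component, so a real system would still have to reattach these to their subword bodies.

```python
from collections import deque

# Hypothetical representation: each row is a list of 0 (background) / 1 (ink).
Image = list[list[int]]


def split_lines(img: Image) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges (bottom exclusive) of text lines.

    A line is a maximal run of rows containing at least one ink pixel,
    i.e. rows where the horizontal projection profile is non-zero.
    Overlapping lines (ascenders/descenders touching) are NOT handled.
    """
    profile = [sum(row) for row in img]  # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count and start is None:
            start = y                    # entering a band of ink
        elif not count and start is not None:
            lines.append((start, y))     # leaving the band
            start = None
    if start is not None:
        lines.append((start, len(img)))  # band touches the bottom edge
    return lines


def connected_components(img: Image, top: int, bottom: int) -> list[list[tuple[int, int]]]:
    """8-connected ink components inside rows [top, bottom), as (row, col) pixel lists.

    Each component is only a candidate subword: dots and diacritics
    are returned as separate components.
    """
    width = len(img[0]) if img else 0
    seen: set[tuple[int, int]] = set()
    comps = []
    for y in range(top, bottom):
        for x in range(width):
            if img[y][x] and (y, x) not in seen:
                # Breadth-first flood fill from this unvisited ink pixel.
                seen.add((y, x))
                queue, comp = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (top <= ny < bottom and 0 <= nx < width
                                    and img[ny][nx] and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                comps.append(comp)
    # Farsi is written right to left, so put the component with the
    # rightmost pixel first.
    comps.sort(key=lambda c: -max(px for _, px in c))
    return comps
```

On a toy 5×6 page whose rows are `000000`, `110010`, `010011`, `000000`, `011100`, `split_lines` returns `[(1, 3), (4, 5)]`. For the first band, `connected_components(page, 1, 3)` returns two components, with the right-hand one first to follow Farsi's right-to-left reading order.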
Similar articles
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
An HMM-based Farsi OCR
OCR (Optical Character Recognition) is the digital encoding of printed and handwritten characters from an image file created through a scanner or other optical imaging devices. In other words, OCR is a software program that converts image-texts into computerized or digital text (figure 1). While OCR has been extensively used as the basic application of different learning methods in machine lea...
Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs
In the field of word recognition, three approaches are used: word isolation, overall word shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces problems because of the difficulty of isolating letters and the low recognition accuracy on texts with poor image quality. Therefo...
Optical Character Recognition with a Neural Network Model for Printed Coptic Texts
Furthermore, historical texts are not passed down through the centuries in their entirety but rather contain lacunae and fragmentary words. This makes automatic post-correction more difficult on historical texts than on modern ones. We used two tools to create language- and even document-specific recognition patterns (or so-called models) to recognize printed Coptic texts. Coptic is the last stage...
A word spotting method for Farsi machine-printed document images
In this paper, a word spotting approach for Farsi printed document images has been presented. The main idea of the paper is the font recognition of Farsi document images and query word modification according to the document image’s font before searching. This operation increases the similarity between the query word image and its instances in the document image; therefore, the performance of th...
Journal: Pattern Recognition
Volume: 14
Publication date: 1981